0.1 Introduction

The goal of this study is to examine how certain variables relate to air quality by analyzing the Air Quality Index (AQI) of counties across the United States, using data collected by the Environmental Protection Agency (EPA).

This report contains two sub-studies: one examining the effects of the Climate Alliance, and another examining the correlation between county characteristics and air quality.

0.2 Reading the Data and EDA

To begin, we read in the EPA datasets.


To get a sense of the AQI of US states over the ten-year period, we plot a heatmap of state AQI by year.


A more detailed view of each state can be obtained by plotting AQI by year in a separate panel for each state.

The EPA tracks six criteria air pollutants: ozone, nitrogen dioxide, sulfur dioxide, lead, carbon monoxide, and particulate matter.

These graphs show the concentration of each pollutant by year. Each graph shows the 10th percentile, the 90th percentile, and the mean concentration of the pollutant.

The plots show that concentrations have gradually decreased over time for most pollutants, and in a few cases have remained roughly the same. This pattern is consistent with effective environmental regulation.
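
The three lines in each pollutant graph can be computed with a per-year summary. The following is a minimal sketch in base R, not the code used for the graphs above: the data frame `readings` and its columns `year` and `concentration` are hypothetical, and the values are made up.

```r
# Hypothetical measurements: one row per monitor reading.
# The column names (year, concentration) are assumptions for illustration.
readings <- data.frame(
  year = rep(c(2010, 2011), each = 5),
  concentration = c(1, 2, 3, 4, 5, 2, 2, 2, 2, 2)
)

# For one year's readings, return the 10th percentile, mean, and 90th percentile.
summarise_year <- function(x) {
  c(p10  = unname(quantile(x, 0.10)),
    mean = mean(x),
    p90  = unname(quantile(x, 0.90)))
}

by_year <- aggregate(concentration ~ year, data = readings, FUN = summarise_year)

# aggregate() stores the three statistics as one matrix column; flatten it
# into ordinary columns named concentration.p10, .mean, and .p90.
by_year <- do.call(data.frame, by_year)
print(by_year)
```

Plotting the three resulting columns against `year` gives one band per pollutant.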

Much of this improvement can be attributed to the Clean Air Act and its amendments in 1977 and 1990.

The 2011 reduction in lead concentration might be a result of changes to the Safe Drinking Water Act.

0.3 Climate Alliance

The Climate Alliance is a bipartisan group of governors who aim to reduce their states' emissions. Formed in 2017 in response to the announced US withdrawal from the Paris Agreement, it acts at the state level.

To determine the effectiveness of the Climate Alliance, we test whether the rate of change of AQI differs significantly between Climate Alliance states and non-member states. A significant difference would be consistent with the Climate Alliance being effective in improving the AQI of its member states. We use data from 2019 (that is, the change in AQI from 2018 to 2019) to observe the effect after the alliance's formation.

## 
## Call:
## lm(formula = delta.aqi.state ~ is.climate.alli, data = mean.state.df %>% 
##     filter(Year == 2019))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.041  -1.541   0.631   2.125   4.818 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)          -0.110      0.667   -0.17     0.87
## is.climate.alliyes   -1.349      0.984   -1.37     0.18
## 
## Residual standard error: 3.47 on 48 degrees of freedom
## Multiple R-squared:  0.0377, Adjusted R-squared:  0.0177 
## F-statistic: 1.88 on 1 and 48 DF,  p-value: 0.177

On average, AQI in Climate Alliance states fell by about 1.35 points more than in non-member states (estimate -1.349), but the difference is not significant at the 0.05 level (p = 0.18).
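
The comparison above is a regression of the change in AQI on a membership indicator. As a rough sketch of how such a model behaves, the following base R example uses made-up numbers; the column names follow the model output, but the data frame `states` is hypothetical and is not the study's data.

```r
# Hypothetical state-level data: change in AQI from 2018 to 2019 and a
# membership flag. The values are made up; only the model form follows the text.
states <- data.frame(
  delta.aqi.state = c(-2.0, -1.0, -3.0, 0.5, 1.0, -0.5),
  is.climate.alli = factor(c("yes", "yes", "yes", "no", "no", "no"),
                           levels = c("no", "yes"))
)

fit <- lm(delta.aqi.state ~ is.climate.alli, data = states)

# With a single two-level predictor, the intercept is the mean change for
# non-members and the slope is the member mean minus the non-member mean.
print(coef(fit))

# The slope's p-value matches a pooled-variance two-sample t-test.
p_lm <- summary(fit)$coefficients["is.climate.alliyes", "Pr(>|t|)"]
p_t  <- t.test(delta.aqi.state ~ is.climate.alli, data = states,
               var.equal = TRUE)$p.value
```

Reading the model this way makes clear that a negative slope means member states' AQI fell more (improved faster) than non-members' over the year.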

The data may also be obscuring some of the effect, because not all states joined in 2017. The last state to join, Montana, joined in mid-2019. Late-joining states like this make the impact of the Climate Alliance harder to detect.

To sharpen the comparison, we restrict the member group to states that joined within the first month.

## 
## Call:
## lm(formula = delta.aqi.state ~ is.climate.alli.early, data = mean.state.df %>% 
##     filter(Year == 2019))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -17.38  -1.78   0.59   1.38   4.90 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                -0.191      0.570   -0.34     0.74  
## is.climate.alli.earlyyes   -1.927      1.078   -1.79     0.08 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.42 on 48 degrees of freedom
## Multiple R-squared:  0.0625, Adjusted R-squared:  0.043 
## F-statistic:  3.2 on 1 and 48 DF,  p-value: 0.08

The difference here is larger (estimate -1.927) and closer to significance (p = 0.08), but it is still not significant at the 0.05 level.

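While our study does not conclude that the Climate Alliance is definitely improving the air quality of its member states, its members are improving slightly faster on average. Both comparisons (p = 0.18 for all members, p = 0.08 for early members) clear a 75% confidence level but not the conventional 95% level.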

0.4 County Level Effects on AQI

Using data published by the USDA's Economic Research Service (ERS), we look for county-level predictors of air quality and examine their correlations with AQI.

We begin by merging the 2019 AQI data with the latest USDA ERS data. We use 2019 data to avoid skew from the 2020 West Coast fires.

We first merge the three sets of ERS county data with one another, and then merge the result with the AQI data by county and state.
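
The two-stage merge described above could look something like the following base R sketch. The table names, the placeholder county names, and all values are made up for illustration; only the join keys (county and state) come from the text.

```r
# Hypothetical stand-ins for the three ERS county tables and the AQI table.
# Table names, county names, and values are assumptions for illustration.
jobs <- data.frame(state = c("CA", "CA", "TX"),
                   county = c("Alpha", "Beta", "Gamma"),
                   UnempRate2019 = c(7, 4, 3))
people <- data.frame(state = c("CA", "CA", "TX"),
                     county = c("Alpha", "Beta", "Gamma"),
                     FemaleHHPct = c(14, 10, 15))
education <- data.frame(state = c("CA", "CA", "TX"),
                        county = c("Alpha", "Beta", "Gamma"),
                        Ed4AssocDegreePct = c(8, 9, 7))
aqi <- data.frame(state = c("CA", "TX"),
                  county = c("Alpha", "Gamma"),
                  med.aqi = c(70, 45))

# Step 1: merge the three ERS tables with one another on state and county.
ers <- Reduce(function(x, y) merge(x, y, by = c("state", "county")),
              list(jobs, people, education))

# Step 2: merge the combined ERS table with the AQI table.
# merge() performs an inner join by default, so counties that have no
# AQI reading (here, Beta) drop out of the result.
merged <- merge(ers, aqi, by = c("state", "county"))
print(merged)
```

Joining on both state and county matters because county names repeat across states.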

We split the cleaned and merged dataset into a predictor matrix X and a response Y for use with cv.glmnet. We call set.seed(1) so that the cross-validation folds are reproducible.
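
A minimal sketch of this split, assuming a merged data frame with `med.aqi` as the response. The data frame `county_df`, its simulated values, and the `region` column are hypothetical. The choice of `alpha = 1` (the `cv.glmnet` default, which gives the LASSO penalty) is also an assumption, since the text does not state which penalty was used.

```r
set.seed(1)  # fixes the random draws below and the cross-validation folds

# Hypothetical merged data: a response (med.aqi) and three made-up predictors.
n <- 60
county_df <- data.frame(
  UnempRate2019 = runif(n, 2, 10),
  FemaleHHPct   = runif(n, 5, 20),
  region        = factor(sample(c("east", "west"), n, replace = TRUE))
)
county_df$med.aqi <- 40 + 1.5 * county_df$UnempRate2019 + rnorm(n, sd = 3)

# cv.glmnet() expects a numeric matrix, so expand factors into dummy columns
# and drop the intercept column that model.matrix() adds.
X <- model.matrix(med.aqi ~ ., data = county_df)[, -1]
Y <- county_df$med.aqi

# Fit only if the glmnet package is installed.
if (requireNamespace("glmnet", quietly = TRUE)) {
  cv_fit <- glmnet::cv.glmnet(X, Y, alpha = 1)
  # Coefficients at the penalty with the lowest cross-validated error.
  print(coef(cv_fit, s = "lambda.min"))
}
```

Predictors whose coefficients are shrunk to zero at the selected penalty are candidates for removal before fitting an ordinary linear model.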

The histogram suggests that the county AQI data are approximately normally distributed.

## Anova Table (Type II tests)
## 
## Response: med.aqi
##                                            Sum Sq  Df F value  Pr(>F)    
## UnempRate2020                                  26   1    0.25 0.61643    
## PctEmpChange1920                              119   1    1.14 0.28520    
## UnempRate2019                                 757   1    7.26 0.00719 ** 
## UnempRate2017                                 845   1    8.10 0.00451 ** 
## PctEmpAgriculture                              60   1    0.58 0.44821    
## PctEmpMining                                    0   1    0.00 0.98589    
## PctEmpConstruction                            227   1    2.17 0.14060    
## PctEmpManufacturing                             6   1    0.05 0.81835    
## PctEmpTrans                                     1   1    0.01 0.93250    
## UnempRate2012                                  15   1    0.15 0.70224    
## UnempRate2009                                 336   1    3.22 0.07303 .  
## PopChangeRate1819                               8   1    0.08 0.77593    
## NetMigrationRate1019                          388   1    3.72 0.05400 .  
## NaturalChangeRate1019                         600   1    5.75 0.01671 *  
## Net_International_Migration_Rate_2010_2019    128   1    1.23 0.26829    
## NetMigrationRate0010                          259   1    2.48 0.11541    
## NaturalChangeRate0010                         286   1    2.74 0.09808 .  
## Immigration_Rate_2000_2010                     25   1    0.24 0.62643    
## BlackNonHispanicPct2010                       387   1    3.71 0.05451 .  
## AsianNonHispanicPct2010                         1   1    0.01 0.90762    
## NativeAmericanNonHispanicPct2010              194   1    1.86 0.17286    
## MultipleRacePct2010                            28   1    0.27 0.60379    
## NonHispanicBlackPopChangeRate0010             592   1    5.67 0.01743 *  
## NonHispanicAsianPopChangeRate0010             508   1    4.87 0.02758 *  
## HispanicPopChangeRate0010                     365   1    3.50 0.06169 .  
## MultipleRacePopChangeRate0010                  22   1    0.22 0.64263    
## WhiteNonHispanicNum2010                       503   1    4.82 0.02830 *  
## MultipleRaceNum2010                           207   1    1.98 0.15933    
## ForeignBornEuropePct                           85   1    0.81 0.36834    
## ForeignBornMexPct                             300   1    2.88 0.09028 .  
## Ed1LessThanHSPct                              268   1    2.57 0.10931    
## Ed2HSDiplomaOnlyPct                           619   1    5.93 0.01503 *  
## Ed3SomeCollegePct                               0   1    0.00 0.98653    
## Ed4AssocDegreePct                            1264   1   12.12 0.00052 ***
## FemaleHHPct                                  1200   1   11.50 0.00072 ***
## HH65PlusAlonePct                              233   1    2.23 0.13535    
## ForeignBornCaribPct                             0   1    0.00 0.96018    
## ForeignBornAfricaNum                          253   1    2.42 0.11981    
## ForeignBornMexNum                            3091   1   29.63 6.6e-08 ***
## LandAreaSQMiles2010                             7   1    0.07 0.79113    
## Deep_Pov_All                                  115   1    1.10 0.29457    
## Residuals                                  100680 965                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We remove the least relevant variables and refit the model to see which factors remain.

## 
## Call:
## lm(formula = med.aqi ~ UnempRate2019 + UnempRate2017 + UnempRate2009 + 
##     NetMigrationRate1019 + NonHispanicBlackPopChangeRate0010 + 
##     NonHispanicAsianPopChangeRate0010 + HispanicPopChangeRate0010 + 
##     WhiteNonHispanicNum2010 + MultipleRaceNum2010 + Ed1LessThanHSPct + 
##     Ed2HSDiplomaOnlyPct + Ed4AssocDegreePct + FemaleHHPct + HH65PlusAlonePct + 
##     ForeignBornMexNum, data = lm_cols)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -38.50  -3.29   1.93   5.88  93.96 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        4.16e+01   2.90e+00   14.36  < 2e-16 ***
## UnempRate2019                     -2.37e+00   7.01e-01   -3.39  0.00073 ***
## UnempRate2017                      2.01e+00   6.93e-01    2.90  0.00382 ** 
## UnempRate2009                      5.11e-01   1.43e-01    3.57  0.00038 ***
## NetMigrationRate1019              -1.02e-01   4.88e-02   -2.10  0.03612 *  
## NonHispanicBlackPopChangeRate0010 -9.97e-03   3.52e-03   -2.84  0.00467 ** 
## NonHispanicAsianPopChangeRate0010  1.09e-02   4.27e-03    2.55  0.01090 *  
## HispanicPopChangeRate0010          1.55e-02   5.48e-03    2.83  0.00472 ** 
## WhiteNonHispanicNum2010            4.02e-06   1.36e-06    2.95  0.00323 ** 
## MultipleRaceNum2010               -6.10e-05   2.32e-05   -2.63  0.00874 ** 
## Ed1LessThanHSPct                  -3.14e-01   8.90e-02   -3.53  0.00043 ***
## Ed2HSDiplomaOnlyPct               -1.44e-01   5.32e-02   -2.71  0.00679 ** 
## Ed4AssocDegreePct                 -4.91e-01   1.66e-01   -2.96  0.00311 ** 
## FemaleHHPct                        5.28e-01   1.25e-01    4.21  2.8e-05 ***
## HH65PlusAlonePct                  -5.67e-01   1.43e-01   -3.96  8.2e-05 ***
## ForeignBornMexNum                  5.32e-05   8.55e-06    6.23  6.9e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.2 on 991 degrees of freedom
## Multiple R-squared:  0.201,  Adjusted R-squared:  0.189 
## F-statistic: 16.6 on 15 and 991 DF,  p-value: <2e-16
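
The variable-removal step can be sketched as follows. This is an illustration on simulated data, not the procedure used to produce the model above: `drop1()` from base R is swapped in for the Type II Anova table (for a model with main effects only, the two give the same F-tests), and the 0.05 cutoff and single-pass selection are assumptions.

```r
set.seed(1)

# Simulated data in which only x1 truly affects the response.
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 2 * sim$x1 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3, data = sim)

# F-test for dropping each term in turn; the first row ("<none>") has no test.
tests <- drop1(full, test = "F")
p_values <- tests[["Pr(>F)"]][-1]
names(p_values) <- rownames(tests)[-1]

# Keep the terms below the (assumed) cutoff and refit the reduced model.
keep <- names(p_values)[p_values < 0.05]
reduced <- lm(reformulate(keep, response = "y"), data = sim)
print(summary(reduced)$coefficients)
```

Because p-values change as terms are removed, an iterative version would repeat the test after each removal rather than filtering once.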

The final model explains about 20% of the variance in county AQI (R-squared 0.201) and suggests that much of the variation is geographical. For example, the positive coefficient on ForeignBornMexNum could signal that counties closer to the Mexican border tend to have worse AQIs due to their location. However, the clearest predictors are the states themselves.

The linear model assumptions appear to hold except in the lower tail, beyond about 1 standard deviation below the mean.

0.5 Conclusion

The overall objective of this study was to use the AQI of counties across the USA to examine how certain variables relate to air quality. Using data collected by the EPA, we focused on the effect of the Climate Alliance on the AQI of its member states, as well as the correlation between county characteristics and air quality.

From this study, we could not conclude that the Climate Alliance has had a significant effect on the AQI of its member states so far, although member states improved slightly faster on average than other states. We also found that much of the variation in AQI appears to be geographical, based on the significant variables in the model.